Search results for "020602 bioinformatics"
showing 10 items of 28 documents
Efficient Algorithms for Sequence Analysis with Entropic Profiles
2017
Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles, and can be used to study the under-/over-representation of subwords, by also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign …
Discriminating graph pattern mining from gene expression data
2016
We consider the problem of mining gene expression data in order to single out interesting features that characterize healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, where there is a labelled graph for each sample. To the best of our knowledge, this is the first attempt to build a different graph for each sample and, then, to have a database of graphs for representing a sample set. Our main goal is that of singling out interesting differences between healthy and unhealthy samples, through the extraction of "discriminating patterns" among graphs belonging to the two different sample sets. Differently from the …
Applying Conceptual Modeling to Better Understand the Human Genome
2016
The objective of the work is to present the benefits of the application of Conceptual Modeling (CM) in complex domains, such as genomics. This paper explains the evolution of a Conceptual Schema of the Human Genome (CSHG), which seeks to provide a clear and precise understanding of the human genome. We want to highlight all the advantages of the application of CM in a complex domain such as Genomic Information Systems (GeIS). We show how this model has evolved over time as we discovered better forms of representation. As we advanced in exploring the domain, we understood that we should extend the model and incorporate the newly detected concepts into it. Here we present and di…
Parallel Pairwise Epistasis Detection on Heterogeneous Computing Architectures
2016
This is a post-peer-review, pre-copyedit version of an article published in IEEE Transactions on Parallel and Distributed Systems. The final authenticated version is available online at: http://dx.doi.org/10.1109/TPDS.2015.2460247. [Abstract] Development of new methods to detect pairwise epistasis, such as SNP-SNP interactions, in Genome-Wide Association Studies is an important task in bioinformatics as they can help to explain genetic influences on diseases. As these studies are time-consuming operations, some tools exploit the characteristics of different hardware accelerators (such as GPUs and Xeon Phi coprocessors) to reduce the runtime. Nevertheless, all these approaches are not able t…
The colored longest common prefix array computed via sequential scans
2018
Due to the increased availability of large datasets of biological sequences, tools for sequence comparison now rely to a greater extent on efficient alignment-free approaches. Most alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that allows several problems to be tackled efficiently with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…
Search for a Minimal Set of Parameters by Assessing the Total Optimization Potential for a Dynamic Model of a Biochemical Network.
2017
Selecting an efficient small set of adjustable parameters to improve metabolic features of an organism is important for reducing implementation costs and the risk of unpredicted side effects. In practice, to avoid the analysis of a huge combinatorial space for the possible sets of adjustable parameters, experience- and intuition-based subsets of parameters are often chosen, possibly leaving some interesting counter-intuitive combinations of parameters unrevealed. The combinatorial scan of possible adjustable parameter combinations at the model optimization level is possible; however, the number of analyzed combinations is still limited. The total optimization potential (TOP) approach is…
Discovering discriminative graph patterns from gene expression data
2016
We consider the problem of mining gene expression data in order to single out interesting features characterizing healthy/unhealthy samples of an input dataset. We present an approach based on a network model of the input gene expression data, where there is a labelled graph for each sample. To the best of our knowledge, this is the first attempt to build a different graph for each sample and, then, to have a database of graphs for representing a sample set. Our main goal is that of singling out interesting differences between healthy and unhealthy samples, through the extraction of "discriminative patterns" among graphs belonging to the two different sample sets. Differently from the other…
SpaceScanner: COPASI wrapper for automated management of global stochastic optimization experiments
2017
Motivation: Due to their universal applicability, global stochastic optimization methods are popular for designing improvements of biochemical networks. The drawbacks of global stochastic optimization methods are: (i) no guarantee of finding global optima, (ii) no clear optimization run termination criteria and (iii) no criteria to detect stagnation of an optimization run. The impact of these drawbacks can be partly compensated by manual work that becomes inefficient when the solution space is large due to combinatorial explosion of adjustable parameters or for other reasons. Results: SpaceScanner uses parallel optimization runs for automatic termination of optimization tasks in case…
How to deal with Haplotype data: An Extension to the Conceptual Schema of the Human Genome
2016
The goal of this work is to describe the advantages of the application of Conceptual Modeling (CM) in complex domains, such as genomics. Nowadays, the study and comprehension of the human genome is a major challenge due to its high level of complexity. The constant evolution in the genomic domain contributes to the generation of ever larger amounts of new data, which means that if we do not manage it correctly, data quality could be compromised (i.e., problems related to heterogeneity and inconsistent data). In this paper, we propose the use of a Conceptual Schema of the Human Genome (CSHG), designed to understand and improve our ontological commitment to the domain and also extend (e…
Paving the way for synthetic biology-based bioremediation in Europe
2009
Synthetic biology (SB) has a dual definition. It is both the design and construction of new biological parts, devices and systems, and also the re‐design of existing, natural systems for useful purposes. The latter field is maybe one of the major challenges within this discipline, since the promising prospect that biological systems may be used as biomachines will certainly be exploited in the near future. Synthetic biology has challenging conceptual possibilities (Moya et al., 2009a) and impressive progress has already been made in biotechnology following SB approaches (de Lorenzo and Danchin, 2008). Much more is expected in the near future from current efforts aiming to make synthetic gen…